State-of-the-art deep learning models are often trained with large amounts of costly labeled training data. However, the need for exhaustive manual annotation may limit a model's generalizability in the limited-label regime. Semi-supervised learning and unsupervised learning offer promising paradigms for learning from abundant unlabeled visual data. Recent progress in these paradigms has demonstrated the strong benefits of leveraging unlabeled data to improve model generalization and provide better model initialization. In this survey, we review recent advanced deep learning algorithms for semi-supervised learning (SSL) and unsupervised learning (UL) from a unified perspective. To offer a holistic understanding of the state of the art in these areas, we propose a unified taxonomy. We categorize existing representative SSL and UL methods with comprehensive and insightful analysis to highlight their design rationales across different learning scenarios and applications in different computer vision tasks. Finally, we discuss emerging trends and open challenges in SSL and UL to shed light on critical future research directions.
Humans show high-level abstraction capabilities in games that require rapidly communicating object information. They decompose the message content into multiple parts and communicate them through an interpretable protocol. To equip machines with such capabilities, we propose the primitive-based sketch abstraction task, where the goal is to represent a sketch using a fixed set of drawing primitives under a budget constraint. To solve this task, our Primitive Matching Network (PMN) learns interpretable abstractions of a sketch in a self-supervised manner. Specifically, PMN maps each stroke of a sketch to its most similar primitive in a given set, predicting an affine transformation that aligns the selected primitive to the target stroke. We learn this stroke-to-primitive mapping end-to-end with a distance-transform loss that is minimized when the original sketch is precisely reconstructed with the predicted primitives. Our PMN abstraction empirically achieves the highest performance on sketch recognition and sketch-based image retrieval, while also being highly interpretable. This opens up new possibilities for sketch analysis, such as comparing sketches by extracting the most relevant primitives that define an object category. Code is available at https://github.com/explainableml/sketch-primitives.
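To make the stroke-to-primitive matching concrete, the following is a minimal, hypothetical sketch of the core idea: fit an affine transformation aligning each candidate primitive to a target stroke and keep the best-aligned primitive. PMN itself predicts the transformation with a neural network trained end-to-end under a distance-transform loss; this sketch substitutes a closed-form least-squares fit, and all function names are illustrative.

```python
# Hypothetical sketch of stroke-to-primitive matching (not the actual PMN).
import numpy as np

def fit_affine(src, dst):
    """Least-squares affine transform mapping src points (N,2) onto dst (N,2)."""
    A = np.hstack([src, np.ones((len(src), 1))])        # (N, 3)
    params, *_ = np.linalg.lstsq(A, dst, rcond=None)    # (3, 2)
    return params

def apply_affine(points, params):
    return np.hstack([points, np.ones((len(points), 1))]) @ params

def match_primitive(stroke, primitives):
    """Return (index, transform) of the primitive that best reconstructs the stroke."""
    best_idx, best_err, best_T = None, np.inf, None
    for i, prim in enumerate(primitives):
        # Resample the primitive to the stroke's length so points correspond 1:1.
        t = np.linspace(0, 1, len(stroke))
        s = np.linspace(0, 1, len(prim))
        prim_rs = np.column_stack([np.interp(t, s, prim[:, d]) for d in range(2)])
        T = fit_affine(prim_rs, stroke)
        err = np.mean(np.linalg.norm(apply_affine(prim_rs, T) - stroke, axis=1))
        if err < best_err:
            best_idx, best_err, best_T = i, err, T
    return best_idx, best_T
```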
High-quality calibrated uncertainty estimates are crucial for numerous real-world applications, especially for deployed ML systems based on deep learning. While Bayesian deep learning techniques allow uncertainty estimation, training them on large-scale datasets is an expensive process that does not always yield models competitive with their non-Bayesian counterparts. Moreover, many high-performing deep learning models that are already trained and deployed are non-Bayesian in nature and do not provide uncertainty estimates. To address these issues, we propose BayesCap, which learns a Bayesian identity mapping for a frozen model, enabling uncertainty estimation. BayesCap is a memory-efficient method that can be trained on a small fraction of the original dataset, enhancing pretrained non-Bayesian computer vision models by providing calibrated uncertainty estimates for their predictions without (i) hampering the performance of the model or (ii) requiring expensive retraining of the model from scratch. The proposed method is agnostic to architecture and task. We show the efficacy of our method on a wide variety of tasks with a variety of architectures, including image super-resolution, deblurring, inpainting, and crucial applications such as medical image translation. Moreover, we apply the derived uncertainty estimates to detect out-of-distribution samples in critical scenarios such as depth estimation for autonomous driving. Code is available at https://github.com/explainableml/bayescap.
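As a rough illustration of the idea, here is a hedged sketch of a "cap" network over a frozen model: it learns an identity mapping on the frozen outputs while predicting a per-output variance, so the likelihood term forces the variance to absorb residual error as uncertainty. The paper uses a heteroscedastic generalized Gaussian likelihood; this sketch simplifies to a plain Gaussian, and all module and variable names are illustrative.

```python
# Simplified BayesCap-style cap network (illustrative, not the paper's model).
import torch
import torch.nn as nn

class CapSketch(nn.Module):
    def __init__(self, dim, hidden=128):
        super().__init__()
        self.mu_head = nn.Sequential(nn.Linear(dim, hidden), nn.ReLU(), nn.Linear(hidden, dim))
        self.logvar_head = nn.Sequential(nn.Linear(dim, hidden), nn.ReLU(), nn.Linear(hidden, dim))

    def forward(self, frozen_out):
        mu = self.mu_head(frozen_out)          # reconstruction of the frozen prediction
        logvar = self.logvar_head(frozen_out)  # per-dimension uncertainty
        return mu, logvar

def nll_loss(mu, logvar, frozen_out):
    # Gaussian negative log-likelihood: the mean must track the frozen output
    # (identity mapping); logvar absorbs the residual error as uncertainty.
    return (0.5 * torch.exp(-logvar) * (mu - frozen_out) ** 2 + 0.5 * logvar).mean()

# Training-loop sketch: the base model stays frozen, only the cap is updated.
# for x in loader:
#     with torch.no_grad():
#         y_hat = frozen_model(x)
#     mu, logvar = cap(y_hat)
#     loss = nll_loss(mu, logvar, y_hat)
```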
Artificial Intelligence (AI) and Machine Learning (ML) are weaving their way into the fabric of society, where they play a crucial role in numerous facets of our lives. As we witness the increased deployment of AI and ML in various types of devices, we benefit from their use in energy-efficient algorithms for low-powered devices. In this paper, we investigate a scale and medium far smaller than conventional devices as we move towards molecular systems that can be utilized to perform machine learning functions, i.e., Molecular Machine Learning (MML). Fundamental to the operation of MML is the transport, processing, and interpretation of information propagated by molecules through chemical reactions. We begin by reviewing the current approaches that have been developed for MML, before moving towards potential new directions that rely on gene regulatory networks inside biological organisms, as well as their population interactions, to create neural networks. We then investigate mechanisms for training machine learning structures in biological cells based on calcium signaling and demonstrate their application in building an Analog to Digital Converter (ADC). Lastly, we look at potential future directions as well as challenges that this area could address.
Building a quantum analog of classical deep neural networks represents a fundamental challenge in quantum computing. A key issue is how to address the inherent non-linearity of classical deep learning, a problem in the quantum domain because the composition of an arbitrary number of quantum gates, consisting of a series of sequential unitary transformations, is intrinsically linear. This problem has been variously approached in the literature, principally via the introduction of measurements between layers of unitary transformations. In this paper, we introduce the Quantum Path Kernel, a formulation of quantum machine learning capable of replicating those aspects of deep machine learning typically associated with superior generalization performance in the classical domain, specifically, hierarchical feature learning. Our approach generalizes the notion of the Quantum Neural Tangent Kernel, which has been used to study the dynamics of classical and quantum machine learning models. The Quantum Path Kernel exploits the parameter trajectory, i.e., the curve delineated by model parameters as they evolve during training, enabling the representation of differential layer-wise convergence behaviors, or the formation of hierarchical parametric dependencies, in terms of their manifestation in the gradient space of the predictor function. We evaluate our approach with respect to variants of the classification of Gaussian XOR mixtures, an artificial but emblematic problem that intrinsically requires multilevel learning in order to achieve optimal class separation.
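The classical path kernel that this work generalizes admits a compact description: average the tangent kernel over the parameter trajectory visited during training. Below is a minimal sketch of that construction, assuming a user-supplied `grad_fn` that returns the gradient of the predictor with respect to its parameters; the quantum version would replace the predictor with a variational quantum model.

```python
# Minimal path-kernel sketch: K(x, x') = mean_t <grad f(x; theta_t), grad f(x'; theta_t)>.
import numpy as np

def path_kernel(X1, X2, theta_trajectory, grad_fn):
    """theta_trajectory: parameter checkpoints saved during training;
    grad_fn(x, theta): gradient of the predictor w.r.t. theta (1-D array)."""
    K = np.zeros((len(X1), len(X2)))
    for theta in theta_trajectory:
        G1 = np.stack([grad_fn(x, theta) for x in X1])   # (n1, n_params)
        G2 = np.stack([grad_fn(x, theta) for x in X2])   # (n2, n_params)
        K += G1 @ G2.T                                   # tangent kernel at this checkpoint
    return K / len(theta_trajectory)
```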
Few-shot learning (FSL) is a central problem in meta-learning, where learners must efficiently learn from few labeled examples. Within FSL, feature pre-training has recently become an increasingly popular strategy to significantly improve generalization performance. However, the contribution of pre-training is often overlooked and understudied, with limited theoretical understanding of its impact on meta-learning performance. Further, pre-training requires a consistent set of global labels shared across training tasks, which may be unavailable in practice. In this work, we address the above issues by first showing the connection between pre-training and meta-learning. We discuss why pre-training yields a more robust meta-representation and connect the theoretical analysis to existing works and empirical results. Secondly, we introduce Meta Label Learning (MeLa), a novel meta-learning algorithm that learns task relations by inferring global labels across tasks. This allows us to exploit pre-training for FSL even when global labels are unavailable or ill-defined. Lastly, we introduce an augmented pre-training procedure that further improves the learned meta-representation. Empirically, MeLa outperforms existing methods across a diverse range of benchmarks, in particular under a more challenging setting where the number of training tasks is limited and labels are task-specific. We also provide an extensive ablation study to highlight its key properties.
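One plausible way to infer global labels across tasks, offered here only as a hedged sketch since the abstract does not spell out MeLa's inference procedure, is to cluster task-local class prototypes in a shared embedding space and treat cluster ids as pseudo global labels for standard pre-training.

```python
# Hedged sketch of global-label inference via prototype clustering
# (illustrative; MeLa's actual procedure may differ).
import numpy as np
from sklearn.cluster import KMeans

def infer_global_labels(tasks, encoder, n_global_classes):
    """tasks: list of (features, local_labels); encoder: feature extractor
    returning a (n, d) numpy array of embeddings."""
    prototypes, index = [], []
    for t_id, (X, y) in enumerate(tasks):
        Z = encoder(X)
        for c in np.unique(y):
            prototypes.append(Z[y == c].mean(axis=0))   # one prototype per local class
            index.append((t_id, c))
    km = KMeans(n_clusters=n_global_classes, n_init=10).fit(np.stack(prototypes))
    # Map each (task, local class) pair to an inferred global label.
    return {key: int(lbl) for key, lbl in zip(index, km.labels_)}
```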
Tree-based machine learning algorithms provide the most precise assessment of the feasibility for a country to export a target product given its export basket. However, the high number of parameters involved hinders a straightforward interpretation of the results and, in turn, the explainability of policy indications. In this paper, we propose a procedure to statistically validate the importance of the products used in the feasibility assessment. In this way, we are able to identify which products, called explainers, significantly increase the probability of exporting a target product in the near future. The explainers naturally identify a low-dimensional representation, the Feature Importance Product Space, that enhances the interpretability of the recommendations and provides out-of-sample forecasts of the export baskets of countries. Interestingly, we detect a positive correlation between the complexity of a product and the complexity of its explainers.
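A permutation-based null model is one natural way to implement such a statistical validation; the sketch below is a hedged stand-in for the paper's actual procedure. It re-fits the tree ensemble on label-shuffled data to obtain a null distribution of feature importances and keeps only the products whose observed importance exceeds a high quantile of that null.

```python
# Hedged sketch of importance validation against a permutation null
# (illustrative; the paper's exact test may differ).
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def significant_explainers(X, y, product_names, n_null=100, q=0.95, seed=0):
    rng = np.random.default_rng(seed)
    model = RandomForestClassifier(n_estimators=200, random_state=seed).fit(X, y)
    observed = model.feature_importances_
    null = np.empty((n_null, X.shape[1]))
    for i in range(n_null):
        y_perm = rng.permutation(y)            # break any real signal
        null[i] = RandomForestClassifier(
            n_estimators=200, random_state=seed + i + 1
        ).fit(X, y_perm).feature_importances_
    threshold = np.quantile(null, q, axis=0)   # per-product null quantile
    return [p for p, obs, thr in zip(product_names, observed, threshold) if obs > thr]
```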
We develop Bayesian neural networks (BNNs) that permit modeling generic nonlinearities and time variation for (possibly large sets of) macroeconomic and financial variables. From a methodological point of view, we allow for a general specification of networks that can be applied to either dense or sparse datasets and that combines various activation functions, a possibly very large number of neurons, and stochastic volatility (SV) for the error term. From a computational point of view, we develop fast and efficient estimation algorithms for the general BNNs we introduce. From an empirical point of view, we show both with simulated data and with a set of common macro and financial applications that our BNNs can be of practical use, particularly for observations in the tails of the cross-sectional or time series distributions of the target variables.
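One standard specification consistent with this description, given here as a hedged sketch rather than the paper's exact model, combines a neural regression function with an AR(1) stochastic-volatility process for the log error variance:

```latex
% Hedged sketch of a BNN with a stochastic-volatility error term.
\begin{align}
  y_t &= f_{\mathrm{NN}}(x_t;\,\theta) + \varepsilon_t,
        \qquad \varepsilon_t \sim \mathcal{N}\!\left(0,\, e^{h_t}\right), \\
  h_t &= \mu + \phi\,(h_{t-1} - \mu) + \sigma_\eta\,\eta_t,
        \qquad \eta_t \sim \mathcal{N}(0, 1),
\end{align}
```

where the time-varying log variance h_t lets the error distribution widen in turbulent periods, which is what makes such models useful for observations in the tails.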
The representation of data is critical for machine learning methods. Kernel methods are used to enrich feature representations, enabling better generalization. Quantum kernels efficiently implement complex transformations that encode classical data in the Hilbert space of a quantum system, potentially yielding even exponential speedups. However, prior knowledge of the data is needed to choose an appropriate parametric quantum circuit to use as the quantum embedding. We propose an algorithm that automatically selects the best quantum embedding through a combinatorial optimization procedure that modifies the structure of the circuit, changing the generators of the gates, their angles (which depend on the data points), and the qubits on which the various gates act. Since combinatorial optimization is computationally expensive, we introduce a criterion based on the exponential concentration of kernel matrix coefficients around the mean to immediately discard an arbitrarily large portion of candidate solutions that are expected to perform poorly. In contrast to gradient-based optimization (e.g., trainable quantum kernels), our approach is not affected by barren plateaus by construction. We use both artificial and real-world datasets to demonstrate the improved performance of our approach with respect to randomly generated PQCs. We also compare the effect of different optimization algorithms, including greedy local search, simulated annealing, and genetic algorithms, showing that the choice of algorithm strongly affects the results.
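The concentration-based filter can be made concrete with a short sketch: if a candidate embedding produces a kernel matrix whose off-diagonal entries concentrate tightly around their mean, it is unlikely to be informative and can be discarded before full evaluation. Here `kernel_matrix` is an assumed callable that simulates or executes the candidate PQC, and the threshold is illustrative.

```python
# Hedged sketch of the kernel-concentration discard criterion.
import numpy as np

def passes_concentration_filter(candidate_circuit, X, kernel_matrix, min_std=1e-2):
    K = kernel_matrix(candidate_circuit, X)          # (n, n) Gram matrix
    off_diag = K[~np.eye(len(K), dtype=bool)]        # exclude trivial K(x, x) entries
    return off_diag.std() > min_std                  # reject concentrated kernels

# Combinatorial-search sketch: only fully score candidates that survive
# the cheap concentration test.
# survivors = [c for c in candidates if passes_concentration_filter(c, X, kernel_matrix)]
```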
Time series forecasting is an important problem with many real-world applications. Ensembles of deep neural networks have recently achieved impressive forecasting accuracy, but such large ensembles are impractical in many real-world settings. Transformer models have been successfully applied to a diverse set of challenging problems. We propose a novel adaptation of the original Transformer architecture focused on the task of time series forecasting, called Persistence Initialization. The model is initialized as a naive persistence model through a multiplicative gating mechanism combined with a residual skip connection. We use a decoder Transformer with ReZero normalization and Rotary positional encodings, but the adaptation is applicable to any autoregressive neural network model. We evaluate the proposed architecture on the challenging M4 dataset, achieving competitive performance compared to ensemble-based methods. We also compare against recently proposed Transformer models for time series forecasting, showing superior performance on the M4 dataset. Extensive ablation studies show that Persistence Initialization leads to better performance and faster convergence. As model size increases, only models with our proposed adaptation gain in performance. We also perform an additional ablation study on the importance of the choice of normalization and positional encoding, and find that Rotary encodings and ReZero normalization are essential for good forecasting performance.
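The persistence-initialization mechanism is simple enough to sketch directly: add a gated model correction, with the gate initialized to zero, on top of a naive persistence forecast that repeats the last observed value, so that at initialization the network is exactly the persistence baseline. The `backbone` below stands in for the paper's decoder Transformer with ReZero normalization and Rotary encodings; any autoregressive forecaster with a matching output shape would do, and the names are illustrative.

```python
# Hedged sketch of Persistence Initialization around an arbitrary backbone.
import torch
import torch.nn as nn

class PersistenceInit(nn.Module):
    def __init__(self, backbone):
        super().__init__()
        self.backbone = backbone
        self.gate = nn.Parameter(torch.zeros(1))   # ReZero-style gate, starts at 0

    def forward(self, x):
        # x: (batch, time, 1); persistence repeats the last observed value
        # and broadcasts over the forecast horizon of the backbone output.
        persistence = x[:, -1:, :]
        return persistence + self.gate * self.backbone(x)
```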